Abstract. Modular structure is pervasive in many complex networks of interactions
observed in natural, social and technological sciences. Its study sheds light on the 
relation between the structure and function of complex systems. Generally speaking, 
modules are islands of highly connected nodes separated by a relatively small number of 
links. Every module can have contributions of links from any node in the network. The 
challenge is to disentangle these contributions to understand how the modular structure 
is built. The main problem is that the analysis of a given partition into modules
involves, in principle, as many data points as the number of modules times the number of nodes. To
confront this challenge, we first define the contribution matrix, the mathematical
object containing all the information about the partition of interest, and then
use a Truncated Singular Value Decomposition to extract the best representation of
this matrix in a plane. The analysis of this projection allows us to scrutinize the
skeleton of the modular structure, revealing the structure of individual modules and 
their interrelations. 
1. Introduction

The concept of modular structure in real complex networks [1] is revolutionizing the 
understanding of the evolution of complex systems [2]. Many efforts have been devoted 
to its automatic detection [3, 4, 5]; however, very little is known yet about the actual
skeleton of the detected modules that build the network. This skeleton promises
to be relevant to understanding why physical processes in complex networks, such as
synchronization [6], present emergent phenomena that are affected by the existence
of topological barriers between modules. We still lack fundamental tools to anticipate
these phenomena from a topological perspective. The current work is intended to provide
network scientists with novel tools to screen the modular structure. The comprehension
of modular structure in networks necessarily demands the analysis of the contribution
of each one of its constituents (nodes) to the modules. Recently, Guimerà et al. [7, 8]
advanced on this issue by proposing two descriptors to characterize the modular structure:
the z-score (the number of standard deviations a data point lies from
the mean of a data set) of the internal degree of each node in its module, and the
participation coefficient (P), which describes how the node is positioned in its own module
and with respect to other modules. Given a certain partition, the plot of nodes in the
z-P plane admits a heuristic tagging of the nodes' roles. The success of this representation
relies on a consistent interpretation of the topological roles of nodes according to the specific
data analyzed.

Here we introduce a formalism to reveal the characteristics of networks at the
topological mesoscale, where the network is viewed as a set of
interconnected modules. We propose a method, based on linear projection theory,
to study the modular structure in networks that enables a systematic analysis and
elucidation of its skeleton. First, we construct a matrix containing all the information
about the modular structure; second, we find an optimal dimensional reduction of
the information contained in it. In particular, we present the optimal mapping of the
information of the modular structure (in the least squares sense) onto a two-dimensional
space. The method has been applied to synthetic and real networks. The statistical
analysis of the geometrical projections allows us to characterize the structure of individual
modules and their interrelations in a unified framework.

The paper is structured as follows. In section 2, we present the motivation of the 
method and the main findings to interpret the outcome. In section 3, the method is 
illustrated with synthetic networks whose structure is controlled. Finally, in section 4, 
the method is tested in real networks and an explanation of the results is offered. 

2. Projection of the modular structure 

A complex network (weighted or unweighted, directed or undirected) can be represented
by its graph matrix $W$, whose elements $W_{ij}$ are the weights of the connections from any
node $i$ to any node $j$. Assuming that a certain partition of the network into modules is
available, we plan to analyze this coarse-grained structure. Note that the partition can
be obtained by any method, and that the one we use, based on modularity [3], is just
one possibility. The main object of our analysis is the contribution matrix $C$ of $N$ nodes
to $M$ modules. The rows of $C$ correspond to nodes, and the columns to modules. The
analysis of this matrix is the focus of our research. The elements $C_{i\alpha}$ are the number
of links that node $i$ dedicates to module $\alpha$, and can be easily obtained as the matrix
product of $W$ and the partition matrix $S$:

$$C_{i\alpha} = \sum_{j=1}^{N} W_{ij} S_{j\alpha} \qquad (1)$$

where $S_{j\alpha} = 1$ if node $j$ belongs to module $\alpha$, and $S_{j\alpha} = 0$ otherwise. The goal is
to reveal the structure of individual modules, and their interrelations, from the matrix
$C$. To this end, we propose to deal with the high dimensionality of the original data
by constructing a two-dimensional map of the contribution matrix, minimizing the loss
of information in the dimensional reduction, and making it more amenable to further
investigation.
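
Eq. (1) is a single matrix product. As a minimal sketch in Python with NumPy (our choice for illustration only; the function name `contribution_matrix` and the toy network of two triangles joined by one link are hypothetical and not part of the analysis reported here), the contribution matrix could be computed as follows:

```python
import numpy as np

def contribution_matrix(W, labels, n_modules):
    """Return C = W @ S, where S[j, a] = 1 if node j belongs to module a."""
    W = np.asarray(W, dtype=float)
    labels = np.asarray(labels)
    S = np.zeros((W.shape[0], n_modules))
    S[np.arange(W.shape[0]), labels] = 1.0  # one-hot partition matrix
    return W @ S

# Toy data (hypothetical): two triangles joined by the single link 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
W = np.zeros((6, 6))
for i, j in edges:
    W[i, j] = W[j, i] = 1.0
labels = [0, 0, 0, 1, 1, 1]

C = contribution_matrix(W, labels, n_modules=2)
print(C)
```

Each row of `C` sums to the degree (or strength) of the corresponding node, since every link of a node is assigned to exactly one module.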

2.1. Singular Value Decomposition of the modular structure 

The approach developed here consists in the analysis of $C$ using Singular Value
Decomposition [9] (SVD), that is, the factorization of a rectangular $N$-by-$M$ real
(or complex) matrix as follows:

$$C = U \Sigma V^{\dagger} \qquad (2)$$

where $U$ is a unitary $N$-by-$N$ matrix, $\Sigma$ is a diagonal $N$-by-$M$ matrix and $V^{\dagger}$
denotes the conjugate transpose of $V$, an $M$-by-$M$ unitary matrix. This decomposition
corresponds to a rotation or reflection around the origin, a non-uniform scaling given
by the singular values (diagonal elements of $\Sigma$) with a possible change in the number
of dimensions, and finally another rotation or reflection around the origin. This
approach and its variants have been extraordinarily successful in many applications [9],
in particular for the analysis of relationships between a set of documents and the words
they contain. In this case, the decomposition yields information about word-word,
word-document, and document-document semantic associations; the technique is known
as Latent Semantic Indexing [10] or Latent Semantic Analysis [11]. Our scenario is
quite similar, with nodes playing the role of words and modules that of documents.
We expect that a similar approach will help to unravel the relations between nodes'
contributions and the modules of a certain partition.

2.2. An optimal 2D map of the modular structure of networks 

A practical use of SVD is dimensional reduction, also known as Truncated
Singular Value Decomposition (TSVD). It consists in keeping only some of the largest
singular values to produce a least-squares optimal, lower-rank approximation (see
Appendix). In the following we will consider the best approximation of $C$ by a matrix
of rank $r = 2$.

The main idea is to compute the projection of the contribution of nodes to a
certain partition (rows of $C$, namely $n_i$ for the $i$-th node) into the space spanned
by the first two left singular vectors, the projection space $\mathcal{U}_2$ (see Appendix). We
denote the projected contribution of the $i$-th node as $\tilde{n}_i$. Given that the transformation
is information preserving [12], the map obtained gives an accurate representation of
the main characteristics of the original data that is visualizable and, in principle, easier to
scrutinize. Note that the approach we propose differs in an essential way from classical
pattern recognition techniques based on TSVD such as Principal Components Analysis
(PCA) or, equivalently, Karhunen-Loève expansions. Our data (columns of $C$) cannot
be independently shifted to zero mean without losing their original meaning. This
restriction prevents the straightforward application of the mentioned techniques, and
also differentiates our work from modern techniques for the analysis of gene
expression patterns [13, 14].
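
The projection can be sketched numerically as follows. This is an illustrative Python/NumPy example, not the implementation used for the results in this paper; the helper name `project_rank2` and the toy contribution matrix (three triangles joined in a line, one module per triangle) are assumptions made for the example. Note that the columns of `C` are deliberately not mean-centered, in contrast with PCA.

```python
import numpy as np

def project_rank2(C):
    """Project the rows of C onto the plane of the two leading singular directions."""
    C = np.asarray(C, dtype=float)
    U, s, Vt = np.linalg.svd(C, full_matrices=False)
    # Coordinates of node i: the first two components of Sigma^{-1} V^T n_i,
    # where n_i is the i-th row of C. No mean-centering is applied.
    coords = (C @ Vt[:2].T) / s[:2]
    return coords, U, s, Vt

# Hypothetical contribution matrix: 9 nodes, 3 modules.
C = np.array([
    [2, 0, 0], [2, 0, 0], [2, 1, 0],
    [1, 2, 0], [0, 2, 0], [0, 2, 1],
    [0, 1, 2], [0, 0, 2], [0, 0, 2],
], dtype=float)

coords, U, s, Vt = project_rank2(C)
print(coords)
```

In this sketch the projected coordinates coincide with the first two columns of `U`, which is why the plane is said to be spanned by the first two left singular vectors.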

The main problem when using SVD always lies in the interpretation of its
outcome. The combination of data in the process makes a direct comparison
between input and output difficult. To overcome this problem, we point out the following
geometrical properties of the projection of the rows of $C$ we have defined (see Appendix
for a mathematical description):

(i) Every module $\alpha$ has an intrinsic direction $\tilde{e}_\alpha$ in the projection space $\mathcal{U}_2$
corresponding to the line of the projection of its internal nodes (those that have links
exclusively inside the module). We call these directions intramodular projections.
This property is essential to distinguish modules that are cohesive, in the sense
that the majority of nodes project in this direction, from those that are
not.

(ii) Every module $\alpha$ has a distinguished direction $\tilde{m}_\alpha$ in the projection space $\mathcal{U}_2$
corresponding to the vector sum of the contributions of all its nodes. We call these
directions modular projections. The modular projection is relevant when compared
to the intramodular projection because their deviations inform about the tendency
to connect with other modules. Note that $\tilde{e}_\alpha$ and $\tilde{m}_\alpha$ are equal only if the module
is disconnected from the rest of the network.

(iii) Any node contribution projection $\tilde{n}_i$ is a linear combination of intramodular
projections, the coefficient of each one being proportional to the original
contribution $C_{i\alpha}$ of links of node $i$ to each module $\alpha$. This property follows
from the linearity of the projection, and expresses the contribution of nodes to the
modules to which they are connected.

Consequently, from (i) and (iii), we can classify nodes. Nodes with only internal links
have a distance to the origin proportional to their degree (or strength). Nodes with
internal and external links separate from the intramodular projection proportionally to
their contributions to other modules. From (ii) we can classify modules. Modules that
have close modular projections are more interrelated. These geometrical facts are the
key to relating the outcome of TSVD to the original data in our problem, see Fig. 1.

Figure 1. Geometrical scheme of the TSVD. The intramodular projection of module
$\alpha$, $\tilde{e}_\alpha$, is the direction along which all internal nodes lie (node $i$ in the plot). The node
contribution projections $\tilde{n}_i$ are represented by vectors in different colors. Finally, the
modular projection $\tilde{m}_\alpha$ is computed as the vector sum of all the node contribution
projections belonging to it. Note that the intramodular projection and the modular
projection do not coincide; the differences between them inform about the cohesiveness
of the module.
2.3. Structure of individual modules

To study the structure of individual modules we concentrate on the analysis of the
projection of nodes' contributions in the plane $\mathcal{U}_2$. Keeping in mind the geometrical
properties (i) and (iii) stated above, we propose to extract structural information
about each module by comparing the map of nodes' contributions to the
intramodular projection directions. To this end it is convenient to change to polar
coordinates, where for each node $i$ the radius $R_i$ measures the length of its contribution
projection vector $\tilde{n}_i$, and $\theta_i$ is the angle between $\tilde{n}_i$ and the horizontal axis. We also
define $\phi_i$ as the absolute angular distance between $\tilde{n}_i$ and the intramodular projection
$\tilde{e}_\alpha$ corresponding to its module $\alpha$, i.e. $\phi_i = |\theta_i - \theta_{\tilde{e}_\alpha}|$.

Using these coordinates $R$-$\phi$ we find a way to correctly interpret the map of the
contribution matrix in $\mathcal{U}_2$: i) $R_{\mathrm{int}} = R\cos\phi$ informs about the internal contribution of
nodes to their corresponding module, as well as about the contribution to their own module made by
connecting to others. To clarify the latter assertion, let us assume that a node $i$ belonging
to a module $\beta$ has connections with the rest of the modules in the network. Given that
this connectivity pattern is a linear combination of intramodular directions $\tilde{e}_\alpha$, the
vector sum implies that connecting with modules $\alpha$ having $|\theta_{\tilde{e}_\beta} - \theta_{\tilde{e}_\alpha}| > \pi/2$ decreases
the modulus $R$, and vice versa; ii) $R_{\mathrm{ext}} = R\sin\phi$ informs about the deviation (as the
orthogonal distance) of each node from the contribution to its own module, see Fig. 2. It
is also possible to study the spread of $\phi$ by using other descriptors proposed in the
context of synchronization [15].

Figure 2. Schematic plot of the coordinates proposed to study the structure of
individual modules. The relative distance of a node from its module is captured by
the angle $\phi$. The respective components $R_{\mathrm{int}}$ and $R_{\mathrm{ext}}$ are depicted.
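
The change to polar coordinates and the two components can be sketched as follows. The Python/NumPy code is a hedged illustration: the function name `polar_descriptors` and the hand-made projections are hypothetical, and the angular distance is wrapped into the interval from 0 to pi, which is one reasonable reading of the absolute distance in angle.

```python
import numpy as np

def polar_descriptors(nodes, intra, labels):
    """Return R, theta, phi, R_int = R cos(phi) and R_ext = R sin(phi) per node."""
    nodes = np.asarray(nodes, dtype=float)
    # Intramodular direction of the module each node belongs to.
    own = np.asarray(intra, dtype=float)[np.asarray(labels)]
    R = np.hypot(nodes[:, 0], nodes[:, 1])
    theta = np.arctan2(nodes[:, 1], nodes[:, 0])
    theta_e = np.arctan2(own[:, 1], own[:, 0])
    # Absolute angular distance, wrapped so that it lies between 0 and pi.
    phi = np.abs(np.angle(np.exp(1j * (theta - theta_e))))
    return R, theta, phi, R * np.cos(phi), R * np.sin(phi)

# Hand-made example (hypothetical): module 0 points along x, module 1 along y.
intra = np.array([[1.0, 0.0], [0.0, 1.0]])
nodes = np.array([[2.0, 0.0], [1.0, 1.0], [0.0, 3.0]])
labels = np.array([0, 0, 1])

R, theta, phi, R_int, R_ext = polar_descriptors(nodes, intra, labels)
print(R_int, R_ext)
```

In this toy case the first and third nodes lie exactly on the direction of their own module, so their external component vanishes, while the second node splits its contribution between both modules.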

We explore the internal structure of modules using the values of $R_{\mathrm{int}}$, and the
boundary structure of modules using $R_{\mathrm{ext}}$. Using descriptive statistics one can reveal
and compare the structure of individual modules. Given that the distribution of
contributions is not necessarily Gaussian, an exploration in terms of z-scores is not
appropriate. Instead we use box-and-whisker charts for the variables, depicting the
principal quartiles and the outliers (defined as values more than 1.5 IQR below
the first quartile or more than 1.5 IQR above the third quartile, where IQR is the
Inter-Quartile Range).
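
The outlier criterion can be written in a few lines. The sketch below (Python/NumPy) is an illustration under assumptions: the function name `tukey_outliers` and the sample values are hypothetical, and quartiles are computed with NumPy's default linear interpolation, which may differ slightly from the convention of other statistical packages.

```python
import numpy as np

def tukey_outliers(values, k=1.5):
    """Flag values more than k * IQR below Q1 or above Q3."""
    values = np.asarray(values, dtype=float)
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    mask = (values < low) | (values > high)
    return mask, (q1, q3)

# Hypothetical values of R_int for the nodes of one module.
r_int = [1, 2, 3, 4, 5, 6, 7, 8, 100]
mask, (q1, q3) = tukey_outliers(r_int)
print(np.asarray(r_int)[mask])
```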

The boxplots of $R_{\mathrm{int}}$ for each module allow for a visualization
of the heterogeneity in the contribution of nodes building their corresponding modules,
and an objective determination of the distinguished nodes in their structure (outliers).
Correspondingly, the boxplots of $R_{\mathrm{ext}}$ inform about the heterogeneity in the boundary
connectivity. Nodes with links in only one module are not considered in these statistics
because they do not provide relevant information about the boundaries (they have
$R_{\mathrm{ext}} = 0$); only nodes that act as bridges between modules are taken into account.
Considering internal nodes in these statistics would eventually produce a collapse of the
quartiles to zero. Assuming that every module devotes some external links (otherwise
it would be disconnected), the width of the boxes in this plot is proportional to the
heterogeneity of such efforts. If only one node makes external connections, then the
boxplot has zero width. Moreover, given two boxes equally wide, their position (median)
determines which module contributes more to keeping the whole network connected.

2.4. Interrelations between modules

The analysis of the interrelations between modules is performed at the coarse-grained
level of their modular projections. The modular projections $\tilde{m}_\alpha$ are aggregated measures
of the nodes' contributions to their particular module. The normalized scalar product
of modular projections provides a measure of the interrelations (overlap) between
different modules. A representation of these data in the form of a matrix ordered by the
values of $\theta_{\tilde{m}_\alpha}$ reveals the actual skeleton of the network at the topological mesoscale,
see Fig. 3.

Figure 3. Schematic plot of the interrelation between the modular projections of 4
modules. The matrix represents the overlap computed as the scalar product between
directions.
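
The overlap between modules amounts to the cosine of the angle between their modular projections. A minimal Python/NumPy sketch follows; the function name `overlap_matrix` and the three example vectors are hypothetical, and sorting by decreasing angle is the ordering convention assumed here.

```python
import numpy as np

def overlap_matrix(modular):
    """Normalized scalar products of modular projections, sorted by decreasing angle."""
    m = np.asarray(modular, dtype=float)
    unit = m / np.linalg.norm(m, axis=1, keepdims=True)
    overlap = unit @ unit.T
    angles = np.arctan2(m[:, 1], m[:, 0])
    order = np.argsort(-angles)  # modules in decreasing order of angle
    return overlap[np.ix_(order, order)], order

# Hypothetical modular projections of three modules.
modular = np.array([[1.0, 0.0], [0.0, 2.0], [3.0, 3.0]])
overlap, order = overlap_matrix(modular)
print(order)
print(np.round(overlap, 3))
```

With this ordering, modules pointing in similar directions end up adjacent in the matrix, so blocks of interrelated modules appear along the diagonal.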

3. Application to synthetic networks 

We start by applying the methodology to synthetic networks, for which we control
the whole network structure. First, we analyze a network built from cliques of
different sizes: we consider a line of cliques of sizes 3 to 10, each joined to the next by a single
link. We consider two different partitions to test the method. The
first partition consists of a module containing the largest clique, and another containing
the rest of the cliques, see Fig. 4a. In the second partition each clique forms a module,
see Fig. 4b. The plots in Fig. 4c,d (left) show the projections of the nodes' contributions in
the plane $\mathcal{U}_2$ spanned by the first two left singular vectors, as well as the intramodular
projections of each module in this plane. The data in $\mathcal{U}_2$ are transformed to polar
coordinates for better visualization and simpler analysis, see Fig. 4c,d (right). The
structure of these plots is repeated in the next examples.
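
For readers who wish to reproduce this test, the clique network and the two partitions can be generated as in the following Python/NumPy sketch. It is an illustration under assumptions: the function name `line_of_cliques` is hypothetical, and since the text does not specify which nodes carry the connecting links, we arbitrarily join the last node of each clique to the first node of the next one.

```python
import numpy as np

def line_of_cliques(sizes):
    """Adjacency matrix of cliques joined in a line by one link between neighbours."""
    n = sum(sizes)
    W = np.zeros((n, n))
    clique_of = np.empty(n, dtype=int)
    start = 0
    for c, size in enumerate(sizes):
        block = slice(start, start + size)
        W[block, block] = 1.0
        clique_of[block] = c
        if c > 0:
            # Assumption: last node of the previous clique links to the first node here.
            W[start - 1, start] = W[start, start - 1] = 1.0
        start += size
    np.fill_diagonal(W, 0.0)
    return W, clique_of

sizes = list(range(3, 11))  # cliques of size 3, 4, ..., 10
W, clique_of = line_of_cliques(sizes)

# Partition 1: the largest clique versus all the others.
two_modules = (clique_of == len(sizes) - 1).astype(int)
C2 = W @ np.eye(2)[two_modules]
# Partition 2: one module per clique.
C8 = W @ np.eye(len(sizes))[clique_of]

bridges2 = int((np.count_nonzero(C2, axis=1) > 1).sum())
bridges8 = int((np.count_nonzero(C8, axis=1) > 1).sum())
print(W.shape, bridges2, bridges8)
```

Here `bridges2` and `bridges8` count the nodes that contribute to more than one module in each partition, that is, the nodes expected to fall off the intramodular directions.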

Projecting the contribution matrix corresponding to the partition in two modules,
Fig. 4c, we clearly observe the relation in connectivity between nodes and the structure
of both modules. The two distinguished nodes that connect both modules lie off
the intramodular projections, while the rest of the nodes lie exactly on these directions.
The different positions along the intramodular projections correspond to the degree
of each node; nodes with identical contributions project to the same position. For the
second partition, Fig. 4d, the modules of size 3 to 9 are concentrated around a similar
direction while the clique of size 10 is separated from the rest. In the plot we have
zoomed into the regions of the $R$-$\theta$ plane around the directions where nodes project. For every
module the projection reflects two positions: one exactly on the intramodular direction,
corresponding to the internal nodes of the clique, and another corresponding to the node
that acts as a connector with the following clique. The connectors towards the preceding
clique (of smaller size) are indistinguishable at the resolution of the plot, but also lie in
a different direction.

Figure 4. Optimal map of the modular structure for
the cliques network partitioned in two modules (a) and the cliques network partitioned in
eight modules (b); each color corresponds to a different module of the given partition.
In (c) and (d) we plot the projected space spanned by the two left singular vectors
of the TSVD, $\mathcal{U}_2$ (left), and its transformation to polar coordinates $R$-$\theta$ (right), for
each network. Dashed lines mark the directions of intramodular projections of each
module. In (d), right, we present a zoom in $\theta$ for better visual inspection.

Figure 5. Analysis of a random homogeneous hierarchical network with community
structure, see text for details. a) Network structure. b) Projection as explained in
Fig. 4.

Continuing the test, we now apply the method to a network model with a well-defined
community structure that has been used as a benchmark for different community
detection algorithms [5], proposed by Girvan and Newman [3]. In that model the authors
construct a network of 128 nodes as a set of 4 communities, each one formed by 32 nodes.
Fixing the mean number of links per node at 16, the parameter describing
the sharpness of the community distribution is $z_{in}$, the average number of links within
the community. A generalization of this model was proposed in [16] to include several
hierarchical levels of communities. The hierarchy is defined as follows: we take a set
of $N$ nodes and divide it into $n_1$ groups of equal size; each of these groups is then
divided into $n_2$ groups and so on up to a number of steps $k$, which defines the number of
hierarchical levels of the network. Then we add links to the network in such a way that
each node is assigned at random $z_1$ neighbours within its group at the first
level, $z_2$ neighbours within its group at the second level and so on. There remains the
number of links that each node has to the rest of the network, which we call $z_{out}$. We
construct a network with $N = 128$ nodes and two hierarchical levels with $n_1 = 2$, $n_2 = 2$,
$z_1 = 5$, $z_2 = 10$ and $z_{out} = 1$. Again the method resolves the modular structure and
individual contributions correctly, see Fig. 5. In Appendix D we also test the
sensitivity and robustness of the method to slight changes in the predefined partition.

4. Application to real networks 

The first network analyzed is Zachary's karate club network [17], which accounts for a
two-year study of the friendships between 34 members of a karate club at a US
university in 1970. At the end of the study period the network was divided
into two groups after a dispute between the club's instructor (node 1) and the club's
administrator (node 34), which ultimately resulted in the instructor leaving and starting a
new club, taking about half of the original club's members with him. The partition we
have used in our study corresponds to four modules resulting from optimizing modularity
[3] using Extremal Optimization [18] refined with Tabu search [19], providing a value
of modularity $Q = 0.420$. After the projection, see Fig. 6, we observe that nodes 1 and 3 in
the green module and 33 and 34 in the blue module are clearly distinguished by their values of $R$,
denoting their important role in supporting the structure of both modules; however, they
are not the nodes that connect with other modules. It is also remarkable that node 10
lies halfway between the modular directions of the larger modules, reflecting its unclassifiable
nature (this node has been persistently misclassified by most community detection
algorithms).

The proposed mapping is also applied to two other real networks, the worldwide 
air transportation network, and the AS-P2P Internet network. The airports network 
data set is composed of passenger flights operating in the time period November 1, 
2000, to October 31, 2001 compiled by OAG Worldwide (Downers Grove, IL) and 
analyzed previously by Prof. Amaral's group [8]. It consists of 3618 nodes (airports) and
14142 links; we used the weighted network in our analysis. Airports corresponding to a
metropolitan area have been collapsed into one node in the original database. The
AS-P2P Internet data set considered is composed of autonomous systems (AS) [20] in the
peer-to-peer (P2P) category, where two ASs freely exchange traffic between themselves
and their customers, but do not exchange traffic from or to their providers or other
peers [21]. We complemented this data set with the geographic localization of the ASs,
resulting in 1217 nodes and 4058 links. We have optimized modularity [3] to find good
partitions of the networks into modules. We have used the partition corresponding to
26 modules and modularity Q = 0.649 for the airports network, and 12 modules and 
Q = 0.387 for the AS-P2P network. Note that any partition, not necessarily the one 
corresponding to optimal modularity, can be analyzed as described. 

The interest of applying the analysis to these two data sets is twofold:
first, since both are geo-referenced, it is possible to assign a tag to each module
corresponding to a geographic area, and second, the modular structures of both networks
are substantially different: while the evolution of the airports network has been mainly shaped
by two well defined continental blocks (USA and W Europe§), the AS-P2P network has
been built in a more homogeneous way. It is very interesting to observe how the
AS-P2P network, following a sort of "wiring optimization", presents a community structure
evenly distributed in areas covering a worldwide belt.

In Fig. 7a,b, we plot the structure of the networks partitioned in modules; these
form the original data that compose our contribution matrices. The geographical
location has been added to the plot for visualization purposes but it has not been
used in the analysis. The plots in Fig. 7c,d (left) show the projections of the nodes'
contributions following the same structure as the preceding plots. The differences
between both modular structures clearly emerge in this projection: the airports
network is basically polarized in two geographical areas, whereas in the AS-P2P network
this polarization does not exist. We also see how certain airports and ASs stand out, with
values of $R$ far above the rest. This effect can be further developed by studying

§ We denote N-S-E-W for the four cardinal points North, South, East and West respectively. 
Figure 7. Optimal map of the modular structure for the optimal partition of the
airports network (a) and the AS-P2P network (b); each color corresponds to a different
module of the given partition. In (c) and (d) we plot the projected space spanned
by the two left singular vectors of the TSVD, $\mathcal{U}_2$ (left), and its transformation to
polar coordinates $R$-$\theta$ (right), for each network. Dashed lines mark the directions of
intramodular projections of each module. Nodes whose contribution is totally internal
to a module project exactly on its corresponding dashed line. In the $R$-$\theta$ plot we have
labelled certain distinguished nodes that also correspond to very important airports
and ASs in the world. For the airports network we have magnified the area over $10^{-1}$
to identify the more important nodes in $R$. The loss of information associated with
the two-dimensional projection is 18.2% for the airports network and 15.8% for the
AS-P2P network.

Figure 8. Box-and-whisker plots of $R_{\mathrm{int}}$ and $R_{\mathrm{ext}}$, respectively, for the two networks depicted in Fig. 7. Modules are sorted according to
medians in increasing order. We label the horizontal axis using names for the modules assigned according to the geographical location of
at least 75% of their nodes. We highlight whiskers and outliers in both networks. Only those modules whose structure is significant
(more than 10 nodes) are represented in the plot.
The structure of modules is scrutinized in Fig. 8, where we depict the box-and-whisker
plots of the internal contributions $R_{\mathrm{int}}$ and external contributions $R_{\mathrm{ext}}$. The
results show the heterogeneity of each module of the partition. Remarkably, the method
reveals outliers distinguished by their capability to support the internal structure of
modules and also to cross-connect them. In Fig. 8a (top), we observe that the USA and W
Europe modules have medians greater than the 75th percentiles of the rest of the modules.
This fact points to the extreme internal cohesion of both areas. We also observe
that the lowest median of $R_{\mathrm{int}}$ corresponds to Alaska; however, Anchorage leads the
internal cohesion orders of magnitude beyond the core. In Fig. 8a (bottom), Canada, W
Europe and C America provide the highest profile of boundary connectivity. Nevertheless,
the role played by USA is still very significant because of its high percentiles and outliers.
On the other hand, Africa, Russia and China are less connected to the world than the
rest of the modules. For the AS-P2P network, the box-and-whisker plots of $R_{\mathrm{int}}$, Fig. 8b (top), indicate
a slight dominance of 3 modules: E Europe, W Europe and the module containing
USA and Japan. Here E Europe does not correspond to the political area but is a tag
we use to represent a geographical area that lies further east than the one denoted
as W Europe. For $R_{\mathrm{ext}}$, Fig. 8b (bottom), the similarity in range and medians reveals the
homogeneity of the mesoscale of this network. Significantly, some highlighted ASs in
the plot do not belong geographically to the assigned tag, although most
of the nodes in that module do (see E Europe, W Europe and Russia).

Finally, we plot the interrelations between modules in Fig. 9 by computing the scalar
product of their respective modular projections. The labels of the matrix are arranged in
decreasing order of the modular projection's angle $\theta_{\tilde{m}_\alpha}$. For the airports network (Fig. 9a)
we observe a clearly polarized structure in two main blocks, with a more diffuse central
part overlapping both (corresponding to the communities mainly composed of nodes in
Canada, Central America, Japan and South America). Japan is especially interesting because
it shows no preference in overlapping with any specific module in the network. In the
AS-P2P network (Fig. 9b) we observe four groups, where neighbors in the analysis
correspond to geographical neighbors. We remark that geographical information
is not included in any part of the analysis; it simply emerges from the projection of the
contribution matrix. The geographical correlation in the AS-P2P network may be surprising
given that communities of use in P2P networks are related to contents or topics; however,
many ASs have to pay other ASs to provide the connection between peers, and thus
geopolitical constraints are revealed.

5. Conclusions 

Summarizing, we have reformulated the analysis of the modular structure by first defining
the object that contains all this information, and second applying Singular Value
Decomposition (SVD) to this object. Dimensional reduction follows in a natural way
from the properties of the truncation of SVD; in particular we concentrate on the
truncation of rank 2, with the idea of having a map of the modular structure amenable to
analysis by any scientist. The approach is very simple and can be understood using basic
algebra. The computational implementation is also affordable given the many
software packages that include an automatic SVD (R and Matlab among others). The
result is a formalism to study the skeleton of networks at the modular level. The
most important problem we have faced in the current research was the interpretation
of the outcome in terms of the original data. We have made a breakthrough on this
interpretation by focusing our attention on the particular geometry of the
projected contributions of nodes. We also present a statistical analysis of the resulting
map using box-and-whisker plots based on percentiles, more appropriate than the use
of z-scores, which assume a Gaussian distribution of values. Finally, we find the map
of interrelations of the modular skeleton.

Figure 9. Overlap matrices between the modules composing the topological mesoscale
of the networks plotted in Fig. 7. Each matrix corresponds to the normalized scalar
product of the individual modular projections (see text for details). Modules are sorted
by decreasing order of modular projection's angle in the plane $\mathcal{U}_2$.

The proposed method might be very useful for scholars in different disciplines
who want access to an easy and tractable map of empirical complex network
data according to a biological, functional or topological partition. We expect that the
analysis of this map will be very helpful to anticipate the scope of dynamic emergent
phenomena that depend on the structure and relations between modules. Spreading of
viruses or synchronization processes are natural candidates to be analyzed considering
the organization of the map. Moreover, we expect that the method can be used for graph
bipartitioning by adaptively moving nodes between two modules while maximizing
the angle in the $R$-$\theta$ plane between them. Further studies of the similarities between
nodes' contribution projections can also help to classify networks according to the role
profiles of nodes [22] and/or modules.
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Appendix A. Properties of TSVD 

Let us assume that we preserve only the r largest singular values and neglect the 
remaining ones by setting them to zero. The reduced matrix $C_r = U \Sigma_r V^{\dagger}$ 
then has several mathematical properties worth mentioning. First, it minimizes the Frobenius 
norm ($\|A\|_F = \sqrt{\mathrm{trace}(A A^{\dagger})}$) of the difference $\|C - C_r\|_F$, which means that among 
all possible matrices of rank r, $C_r$ is the best approximation in a least-squares sense. 
Second, $C_r$ is also the best approximation in the statistical sense: it maintains the most 
significant portion of the information of the original matrix [12]. The left and right singular 
vectors (from matrices U and V respectively) capture invariant distributions of values of 
the contribution of nodes to the different modules. In particular, the larger the singular 
value, the more information is represented by its corresponding left and right singular 
vectors. We have used the LAPACK-based implementation of SVD in MATLAB. We 
warn that some numerical implementations of SVD suffer from a sign indeterminacy; in 
particular, the one provided by MATLAB is such that the first singular vectors from an 
all-positive matrix always have all-negative elements, whose sign obviously should be 
switched to positive [23]. 
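
As an illustration of these properties, the following minimal Python sketch truncates a toy all-positive matrix at rank r = 2, checks that the Frobenius error of the truncation equals the square root of the sum of the discarded squared singular values, and resolves the sign indeterminacy. NumPy is assumed here in place of MATLAB, and the function name `truncated_svd`, the toy matrix and the particular sign convention are hypothetical choices made only for illustration.

```python
import numpy as np

def truncated_svd(C, r):
    """Rank-r truncation of C with a fixed sign convention (illustrative helper)."""
    U, s, Vt = np.linalg.svd(C, full_matrices=False)
    U, s, Vt = U[:, :r].copy(), s[:r].copy(), Vt[:r, :].copy()
    # Assumed convention: flip each singular pair so that the components of the
    # right singular vector sum to a non-negative value.
    for k in range(r):
        if Vt[k].sum() < 0:
            U[:, k] *= -1
            Vt[k] *= -1
    return U, s, Vt

rng = np.random.default_rng(0)
C = rng.integers(1, 10, size=(12, 4)).astype(float)  # toy all-positive matrix

U, s, Vt = truncated_svd(C, 2)
C_r = (U * s) @ Vt                                   # rank-2 approximation U Sigma_r V^T

s_full = np.linalg.svd(C, compute_uv=False)
err = np.linalg.norm(C - C_r, "fro")                 # Frobenius norm of the residual
discarded = np.sqrt(np.sum(s_full[2:] ** 2))         # sqrt of discarded sigma^2
print(err, discarded)
```

Flipping a left and a right singular vector together leaves the product $U \Sigma_r V^{\dagger}$ unchanged, so the convention only affects the orientation of the map and not the approximation itself.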



Appendix B. Projection using TSVD of rank 2 

In the case of a rank r = 2 approximation, the uniqueness of the rank-two decomposition 
is ensured [9] if the ordered singular values $\sigma_i$ of the matrix $\Sigma$ satisfy $\sigma_1 > \sigma_2 > \sigma_3$. This 
dimensional reduction is particularly interesting to depict results in a two-dimensional 
plot for visualization purposes. In the new space there are two different sets of singular 
vectors: the left singular vectors (columns of matrix U), and the right singular vectors 
(rows of matrix $V^{\dagger}$). Given that we truncate at r = 2, we focus our analysis on the first 
two columns of U; we call this the projection space $U_2$. The coordinates $\tilde{n}_i$ of the 
projection of the contributions $n_i$ of node i are computed as follows: 

$$\tilde{n}_i = \Sigma_2^{-1} V^{\dagger} n_i \qquad \text{(B.1)}$$

Here $\Sigma_2^{-1}$ denotes the pseudo-inverse of the diagonal rectangular matrix $\Sigma_2$ (the singular 
values matrix truncated to 2 rows), simply obtained by inverting the values of the 
diagonal elements. It is possible to assess the loss of information of this projection 
compared to the initial data by computing the relative difference between the squared 
Frobenius norms: 

$$E_r = \frac{\|C\|_F^2 - \|C_r\|_F^2}{\|C\|_F^2} = \frac{\sum_{\alpha=1}^{M} \sigma_\alpha^2 - \sum_{\alpha=1}^{r} \sigma_\alpha^2}{\sum_{\alpha=1}^{M} \sigma_\alpha^2} \qquad \text{(B.2)}$$
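
The projection and the information loss described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the 6 x 3 matrix `C` is made-up toy data (rows are nodes, columns are modules), the helper name `project_contributions` is hypothetical, and the loss is computed from squared singular values.

```python
import numpy as np

def project_contributions(C, r=2):
    """Project every row n_i of C as Sigma_r^{-1} V_r^T n_i (illustrative helper)."""
    U, s, Vt = np.linalg.svd(C, full_matrices=False)
    # Row i of N_tilde holds the r coordinates of node i in the projection space.
    N_tilde = (C @ Vt[:r].T) / s[:r]
    # Relative loss of information from discarding singular values beyond r.
    E_r = 1.0 - np.sum(s[:r] ** 2) / np.sum(s ** 2)
    return N_tilde, E_r, U[:, :r]

# Toy contribution matrix (made-up numbers): 6 nodes, 3 modules.
C = np.array([[3, 1, 0],
              [4, 0, 0],
              [2, 0, 1],
              [0, 3, 1],
              [1, 4, 0],
              [0, 1, 3]], dtype=float)

N_tilde, E_r, U_2 = project_contributions(C)

# Polar coordinates of each node projection in the plane.
R = np.hypot(N_tilde[:, 0], N_tilde[:, 1])
theta = np.arctan2(N_tilde[:, 1], N_tilde[:, 0])
print(E_r)
```

Because $C = U \Sigma V^{\dagger}$, the projected rows coincide with the first two columns of U, which is why the text refers to $U_2$ as the projection space.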

Appendix C. Geometrical properties of the projection of C 

The intramodular projection $\tilde{e}_\alpha$ corresponding to module $\alpha$ is defined as the projection 
of the Cartesian unit vector $e_\alpha = (0, \ldots, 0, 1, 0, \ldots, 0)$ (the $\alpha$-th component is 1, the rest 
are zero), i.e. 

$$\tilde{e}_\alpha = \Sigma_2^{-1} V^{\dagger} e_\alpha \qquad \text{(C.1)}$$

Any node in the original contribution matrix can be represented as 

$$n_i = \sum_{\alpha=1}^{M} C_{i\alpha} e_\alpha \qquad \text{(C.2)}$$
Figure D1. Robustness of the method to noise in the partition. We show the 
separation from the intramodular directions of modules 1 to 4 (top to bottom) of all 
nodes; in particular, we track the deviation of the nodes when some of them have been 
assigned to the incorrect module. The nodes that have been moved are those that 
deviate most from the intramodular projection of module 2. 



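The projection of $n_i$ gives the node contribution projection 

$$\tilde{n}_i = \sum_{\alpha=1}^{M} C_{i\alpha} \left( \Sigma_2^{-1} V^{\dagger} e_\alpha \right) = \sum_{\alpha=1}^{M} C_{i\alpha} \tilde{e}_\alpha \qquad \text{(C.3)}$$

a linear combination of intramodular projections. In particular, a node i whose 
contribution is totally internal to a module $\alpha$ is projected as $\tilde{n}_i = k_i \tilde{e}_\alpha$, where $k_i$ is 
the node degree. The modular projections $\tilde{m}_\alpha$ are computed as the vector sum of all 
the projections of node contributions, for those nodes belonging to module $\alpha$, i.e. 

$$\tilde{m}_\alpha = \sum_{i=1}^{N} S_{i\alpha} \tilde{n}_i \qquad \text{(C.4)}$$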
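
The geometrical relations of this appendix can be checked numerically with a short Python sketch. It rests on assumptions not stated in this appendix: the contribution matrix is taken to be C = W S, with W the adjacency matrix and S the node-by-module partition matrix, and the toy graph (three triangles joined in a chain) is invented for illustration.

```python
import numpy as np

# Toy graph (hypothetical): three triangles joined in a chain by two edges.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5),
         (6, 7), (6, 8), (7, 8), (2, 3), (5, 6)]
N, M = 9, 3
W = np.zeros((N, N))
for i, j in edges:
    W[i, j] = W[j, i] = 1.0

labels = np.repeat(np.arange(M), 3)   # nodes 0-2, 3-5, 6-8 form the modules
S = np.eye(M)[labels]                 # N x M partition matrix, S[i, a] = 1 if i in a
C = W @ S                             # assumed definition of the contribution matrix
k = W.sum(axis=1)                     # node degrees

U, s, Vt = np.linalg.svd(C, full_matrices=False)
P = Vt[:2] / s[:2, None]              # 2 x M operator Sigma_2^{-1} V^T

E_tilde = P @ np.eye(M)               # column a is the intramodular projection (C.1)
N_tilde = (P @ C.T).T                 # row i is the projection of node i (B.1)
M_tilde = S.T @ N_tilde               # row a is the modular projection (C.4)

# Node 1 only has neighbours inside module 0, so it lies on e~_0 scaled by k_1.
print(N_tilde[1], k[1] * E_tilde[:, 0])
```

In this toy example node 1 has all of its links inside its own module, so its projection is exactly $k_1 \tilde{e}_0$, while nodes 2 and 3, which carry the link between the first two modules, deviate from their intramodular directions.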

Appendix D. Effect of noise on C 

The method presented is fairly robust to perturbations in the partition or, equivalently, 
in the contribution matrix C. To support this claim we perform the following experiment: 
using the benchmark network proposed by Newman and Girvan [TJ, see section 3, with 
128 nodes, $z_{in} = 15$ and $z_{out} = 1$, we perform slight changes in the predefined partition 
by moving nodes from module 1 to module 2. First we move only one node, then 
two nodes, and finally 8 nodes. This changes matrix C, which must in turn affect the 
TSVD output. Fig. D1 contains the nodes' projections as the mentioned movements 
take place (squares, triangles and diamonds respectively). Consistently, the projections of 
module 1's nodes progressively decrease in $R$. Module 2 balances this, as it retains 
the weight leaving module 1. Sensitivity to intermodular connections is also 
evidenced: when a single new node appears in module 2 (Fig. D1, squares), $\phi_i$ has an 
outstanding value compared to the rest; this is also evident when two nodes enter module 
2 (Fig. D1, triangles). When moving 8 nodes, the effect is less drastic for the deviations 
in $\theta$ and more drastic in $R$. Unsurprisingly, modules 3 and 4 remain mostly unchanged: 
the interplay between modules 1 and 2 (nodes leaving one group for the 
other) does not drastically affect their internal characteristics, nor their importance in 
the whole structure. 
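
A rough way to set up an experiment of this kind is sketched below in Python. It is not the code used for Fig. D1. It assumes that the benchmark can be approximated by drawing edges independently, with probabilities chosen so that the expected internal and external degrees are 15 and 1, that C = W S, and that modules are indexed from 0 (so "module 1" of the text is index 0). The helper name `modular_polar` is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
N, M, size = 128, 4, 32
z_in, z_out = 15, 1

labels = np.repeat(np.arange(M), size)           # planted partition, 4 groups of 32
same = labels[:, None] == labels[None, :]
# Assumption: independent edges with expected degrees z_in (inside) and z_out (outside).
p = np.where(same, z_in / (size - 1), z_out / (N - size))
upper = np.triu(rng.random((N, N)) < p, k=1)     # sample the upper triangle only
W = (upper | upper.T).astype(float)              # symmetric adjacency, no self-loops

def modular_polar(W, labels, M):
    """Return R and theta of every modular projection, plus C (illustrative helper)."""
    S = np.eye(M)[labels]
    C = W @ S                                    # assumed contribution matrix
    U, s, Vt = np.linalg.svd(C, full_matrices=False)
    N_tilde = (C @ Vt[:2].T) / s[:2]             # node contribution projections
    M_tilde = S.T @ N_tilde                      # modular projections
    R = np.hypot(M_tilde[:, 0], M_tilde[:, 1])
    theta = np.arctan2(M_tilde[:, 1], M_tilde[:, 0])
    return R, theta, C

results = {}
for moved in (0, 1, 2, 8):
    noisy = labels.copy()
    noisy[:moved] = 1                            # reassign nodes from index 0 to index 1
    results[moved] = modular_polar(W, noisy, M)

for moved, (R, theta, _) in results.items():
    print(moved, np.round(R, 3))
```

Comparing the printed values of R across the four partitions gives a quick view of how the modular projections respond to the reassigned nodes. The sign of the singular vectors should be fixed before comparing angles between runs.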